ABSTRACT

This paper reports the initial phase of a series of
experiments conducted on a large number of videotapes made for the
purpose of analyzing public-school classroom interaction. The
experiments originally aimed to predict the most reliable, efficient
and economic way of producing transcriptions which are sufficiently
representative of the verbal events to be used for empirical research.
Results tabulated thus far indicate that, particularly among
nonlinguists, but also among linguists, transcriptions of the same
event into standard orthography are apt to differ to a significant
extent; that some of these differences may not be entirely
predictable; and that it takes at least two iterations of
post-editing of the transcript to get a reasonable orthographic
representation of the event. It also appears that the more
complicated the structures involved, whether they be social, semantic
or grammatical, the more verifications or post-editings are needed to
produce an accurate transcription. The optimum work increment,
processor personality, training or sequencing is not yet
determinable, but, especially for difficult passages, it is likely
that pairs of judges in the final editings, working together with
transcript and tape, will be more efficient than single judges left
alone with their idiosyncratic prejudices, anticipations, hearing and
experience. (Author/AM)

From Mouth to Hand: Obstacles in rendering verbal events
faithfully into standard orthography

(Paper presented at the annual meeting of the Linguistic Society of
America, July 24, 1970, at Columbus, Ohio, by Harriett Mutt Hays,
Research Associate, Center for Research in Social Behavior, University
of Missouri, Columbia, Missouri.)

To a phonetician or a lexicologist who has had direct experience 
in the field with variations among observers who transcribe or analyze 
oral data, either live or from electronic recordings, it is a matter of 
fact that all observers do not hear the same utterance in the same way. 
Kurath and McDavid, in Linguistic Atlas discussions, have reported 
variations among trained field workers. The Swedish dialectician 
Ringaard found that perceptions of other dialects by trained
phonologists were influenced by their own manner of speech. Lieberman
has noted that a great deal of the linguist's perception of prosodic 
features is based on intuition or knowledge of grammatical structures. 

I. H. Paul, a psychologist, noted some time ago that listeners' recall
of a verbal event was subject to considerable variation among observers.
Recently, Gumperz has bemoaned the fact that it is very tedious to
obtain a transcript of an oral event for sociolinguistic studies.

What emphasis there has been on observer discrepancies has 
generally concentrated on subwordal, or phonological phenomena, or 
on more abstract functional or semantic entities. Very little has
been done to examine the actual extent and cause of observer discrepancies 
relative to the translation of an oral situation into standard orthography. 
The assumption in many fields seems to be that a written transcription will 
not vary extensively from the real event if it concentrates on representing
just the uttered words and sentences of that event: it is the prosodic
information which is apt to be distorted. As we all have observed,
transcriptions of verbal events are sometimes used as primary data for
that event, for research, for legal action, for political record.

The clue to the general ignorance of the difficulties involved in
capturing what is actually said may be reflected in the problems of
illustrating and assessing them. It is, indeed, extremely tedious to
represent oral information in a permanent form which is easily accessible
for perusal, and it is even more tedious to analyze varying
representations of an event.

Signals made simultaneously in several dimensions of one medium 
have to be telescoped into perhaps only one or two dimensions of another.
What is represented, for instance, in speech by the quantitative indicators
(prosodic features) of amplitude, frequency, rate, and duration, which
occur simultaneously with the qualitative indicators (consonants and
vowels), is represented in writing as two-dimensional graphic signs
(punctuation marks) which usually appear sequentially to the qualitative
symbols (letters). The only simultaneous indicators for standard formal
orthographic representations are capital letters, italics, underlines,
boldface print, and the like. There is not a one to one ratio of the
two symbolic systems, so their confusion is inherent in any
transliteration. This is then compounded by the requirements of written discourse
that all utterances be segmented by terminals which enclose strings of
supposedly specific structures including what are referred to in standard
school grammars as 'subjects,' 'predicates' and 'complete thoughts.'

Spoken discourse, particularly with informal style, is characterized by 
what would be considered 'fragmentation' in written grammatical tradition.
The problem of representing these 'incomplete thoughts' is difficult for
translators who have been given no guidelines, particularly when they may
have differing views of 'completeness' or of 'grammaticality.'

There are other problems which accompany the conversion procedure.
Both the media and the situations for production of speech and production 
of its transliteration are different, presenting numerous possibilities 
for distortion. Speech, which can be considered the primary data, is a 
relatively unpredictable string of events incorporating a number of 
mutually interactive semiotic systems among which oral gesture and other
expressive behavior patterns are included. Transliteration of the speech 
event is much more restricted in its potential boundaries than is the 
production of that event. Yet in both situations the participants have 
to add their individual interpretations to the events. Both are
translation systems of a sort; speech is the translation into sound,
apparently of thought product or behavioral convention; transliteration
is a translation of a part of the speech event into a graphic medium. 

Just as speech suffers from limitations in its signals which do not 
represent all the elaborations of the mind, so transliteration suffers
from a lack of signals which simultaneously reinforce, extend or
contradict nuances of the verbal message. The system which the transliterator
uses seems rather to be designed to create graphic events such as essays,
novels, poems or letters than to convert an alien system to its form.

The transliterator has the freedom of neither the author nor the
speaker. Like the hearer and the reader, he must bring to the
communication system his own experience in order to interpret the multi-referential
signals which are employed. But unlike the hearer and the reader, he may 
not include them overtly in his transcription. His is a translation 
problem, in which a part of the other code is systematically left out. 

The results of such a conversion, without supplementation of electronic 
recordings of the oral event, may be quite unusable as primary data, even
though they may be more valid than the recall of the event by any one 
individual or individuals who do not commit the recall to a transcription. 

This paper reports the initial phase of a series of experiments 
conducted on a large number of videotapes made for the purpose of analyzing 
public school classroom interaction. The original aim of the experiments 
was to predict the most reliable, efficient and economic way of producing
transcriptions which are sufficiently representative of the verbal events
to be used for empirical research. 

The results of work tabulated thus far indicate that, particularly
among non-linguists, but also among linguists, transcriptions of the same 
event into standard orthography are apt to differ to a significant extent, 
that some of these differences may not be entirely predictable, and that 
it takes at least two iterations of post-editing of the transcript to get 
a reasonable orthographic representation of the event. It appears also 
that the more complicated are the structures involved, whether they be 
social, semantic or grammatical, the more verifications or post-editings
are needed to produce an accurate transcription. The optimum work
increment, processor personality, training or sequencing is not yet determinable,
but, especially for difficult passages, it is likely that pairs of judges 
in the final editings, working together with transcript and tape, will be 
more efficient than single judges left alone with their idiosyncratic 
prejudices, anticipations, hearing and experience. 
The following discussion illustrates the kinds of omissions or
distortions which the transliterator is apt to make when he transcribes
an oral event using the standard literary graphic code. Note that the
discrepancies tabulated refer only to the differences within the 
standard graphic transliteration system. They do not take account 
of the real situation, the actual amount of information of the communica- 
tion system which is preserved or lost because of deficiencies in the 
system or elsewhere. 

In order to assess the results, of course, it was necessary to 
use a preserved oral event. It was not possible at the time to make 
recordings specifically for this purpose and to have multiple observers 
on the spot judge the relative fidelity of the recorded material to 
the live situation. Such information was not extant for the recordings 
which were available, so no judgments about their actual fidelity to
the live situation will be pertinent for this study. 

It is pertinent, however, to know the conditions under which the 
recordings were made. Both segments of tape discussed in this study were
recorded on the same 2400 foot roll of 2 inch 3M videotape, in the same
urban elementary school, one each from two different sixth grade classes. 
Both of the teachers of these classes were young (20-30) females, who 
appear to speak the standard (prestige) teacher dialect of that Missouri 
city. The teacher of Segment I was black, of II was white. Class I was 
in English composition. Class II was in Social Studies, apparently a
geography lesson in which a certain amount of reading aloud from the 
textbook took place. The pupils were male and female, black and white, 
children of approximately 12 years of age who spoke the local urban 
dialect but did not appear to have adopted so-called standard American. 

Six microphones were used. The teacher wore a microphone suspended 
about the neck. The audio channel from this microphone was recorded on 
one track of a two track recording system. Four other microphones were 
hung from the ceiling and were recorded on the other track. All these 
were supplemented by another, directional, microphone which was aimed 
at the immediate emitter source. Two cameras were installed at opposite 
points in the classroom. The teacher was kept in constant focus, and 
the teacher picture inserted into an unused portion of the general class- 
room picture. 

A detailed description of the recording equipment is found in
Biddle and Adams, 1967. The recordings used for this study were made
in 1968 [handwritten insertion illegible]. There was no special
attempt made by teacher or pupils to
enunciate or otherwise distort their behavior in order to improve the 
video and audio clarity of the recording. There were occasions of 
single as well as multiple responses by members of the class. The 
teacher was usually audible, but sometimes members of the class 
stationed far from the general microphones were difficult if not 
impossible to distinguish. Sometimes part of the class was not 
visible on the videoscreen. The lighting (unsupplemented on a rainy 
day), the distance, and the focus of the recording were such that
the facial expressions of the class members were often not perceptible.
Although a seating chart and rosters had been obtained at the time of 
recording, apparently there was no check on the actual position or 
presence of individual pupils, for neither these nor other tapes in 
the series reflect very well the arrangements indicated. (On some 
tapes there is no relationship at all between the rosters and the
arrangement or content of the class). As we will see, this proves 
unfortunate, and, for those researchers for whom it is not a matter 
of course to diagram recorded events for location and activity of
participants, it would be well to take note. Literally hours of weeks have
been spent by us attempting to straighten out boys from girls, black
faces from white ones, high voices from low, etc., with very
unsatisfactory results. It is our experience that, if the information is not
gathered at time of collection, it might well be permanently lost. This
loss of essential informant information then limits the use to which 
otherwise acceptable materials might be put. 

The tape segments were arbitrarily chosen, with no pre-examination 
of the tape itself, for the contrasts originally were to serve as a quick
illustration to associates that caution needed to be exercised in inter- 
preting the oral material. The transcribers at the time were engaged in 
transcribing the tapes for the recorded series, and the next tape scheduled 
for transcription was selected for analysis. 
Both transcribers and editors used the same playback equipment:
a standard CONRAC 240 1 ' monitor set, an AMPEX VR-1500 portable tape
recorder with Play, Fast-Forward and Reverse controls handling two
channels and accommodating a Tandberg 22 footpedal with two button
controls for Forward and Reverse. A Sharp headset connected to a
switchbox for signals from either channel singly, or both channels
stereophonically, to the earphones, was used by some transcribers and editors.
Others preferred to listen without earphones. No attempt was made to
standardize or control this, but a casual survey indicates that it is 
probably more efficient to use the earphones, which seem to cut off 
transcription environment interference noise. Some persons, however, 
complained of headache from the headsets, and it was assumed that for
them it was more efficient to work without both headache and headsets. 

A survey of business concerns regarding optimal work increment 
for persons operating dictaphones seemed to indicate that a twenty 
minute period might be optimal for transcribing. This was shocking 
to the secretaries involved who had been in the habit of spending a 
much longer increment. A compromise of about 40 minutes was finally
settled upon for transcription sessions. Time allowed for paper insertion,
forwarding and reversing the tapes, and examination of the video image was
assumed to constitute proper rest periods within the 40 minutes.
(Examination of manuscripts seems to indicate that transcriber efficiency decreases
rapidly at the end of 60 minutes.) About half of the editors claimed that 
they didn’t start getting efficient until they had been working for about 
an hour, so there is certainly divided opinion on work increment. Some
of the editors also claimed they could do better transcriptions than the
transcribers by working at long intervals by hand. (This has been a
common reaction. Most persons who have seen the lists of discrepancies
have volunteered the information that they themselves would not make errors.) 
Since none of them turned out to be infallible, it is still unclear whether 
relative time has any bearing on the quality of output. There is no
question, however, that it is much more expensive to employ a slow working
than a fast working editor.

Editor is really a misnomer. The editors were not to function as
standard literary editors do. Rather they were to reproduce an utterance
without improving upon it. They were to compare all previous graphic
versions against the tape of the situation and to insert, delete or modify
those translations which seemed inappropriate. Similarly, the transcriber
was to reproduce rather than beautify the original utterance.

The regular transcribers for the interaction study then, four
secretaries employed by the research center, one a college graduate,
the others with high school degrees and secretarial training, transcribed
Class I using the conventions they had already established: All senders
of utterances were designated in parentheses, left-justified on a new
line. All public utterances were transcribed into standard orthography.
Indistinguishable utterances were represented by a line, whose length
might or might not have an impressionistic relationship to length of 
utterance. 

Class II was transcribed by the same set of secretaries but
after a two hour session to establish additional conventions. Annota- 
tions or disambiguations to clarify the context of ambiguous utterances 
were to be inserted in slashes, on the assumption that, if they were
properly marked they could easily be left out. Punctuation was to be 
reduced to a minimum, where possible. For terminals, only period, indicating 
a statement neutral in tone or feeling, question mark to indicate a definite 
question contour, and three dots to mark a suspended or unfinished oral 
sentence were to be used. Quotes were to enclose matter being read out 
loud by an emitter. (The later convention ...£ used by some editors had
not yet been established. The practice of indicating pauses was also
not standardized until the post-editing was well under way.)
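
The conventions just described are regular enough to be checked mechanically. The following Python sketch is offered only as an illustration and was not part of the original procedure. The function name `parse_line`, the field names, and the sample lines are hypothetical, and the sketch assumes that the utterance follows the parenthesized emitter label on the same line.

```python
import re

# Assumed line layout: "(EMITTER) utterance text /annotation/ more text"
LINE_RE = re.compile(r"^\((?P<emitter>[^)]+)\)\s*(?P<body>.*)$")
ANNOTATION_RE = re.compile(r"/([^/]*)/")   # annotations are set off in slashes
QUOTE_RE = re.compile(r'"([^"]*)"')        # quotes enclose matter read aloud


def parse_line(line):
    """Split one transcript line into emitter, annotations, read-aloud
    matter, the bare utterance, and the kind of terminal used."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("line does not begin with an emitter in parentheses")
    body = match.group("body")
    annotations = [a.strip() for a in ANNOTATION_RE.findall(body)]
    # Annotations are marked precisely so that they can be left out again.
    utterance = " ".join(ANNOTATION_RE.sub(" ", body).split())
    tail = utterance.rstrip('"')  # look past a closing quote for the terminal
    if tail.endswith("..."):
        terminal = "suspended"
    elif tail.endswith("?"):
        terminal = "question"
    elif tail.endswith("."):
        terminal = "statement"
    else:
        terminal = None
    return {
        "emitter": match.group("emitter").strip(),
        "annotations": annotations,
        "read_aloud": QUOTE_RE.findall(utterance),
        "utterance": utterance,
        "terminal": terminal,
    }
```

Under these assumptions, `parse_line('(T) Who can find /points to map/ the river?')` yields the emitter `T`, one annotation, the utterance with the annotation removed, and a terminal of `question`.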

The determination of the length of the tape segments contrasted 
was calculated from the first phrase for which there was consensus among 
versions to the last event of the shortest transcript, a total, for 
Class I, of seven minutes of real time. The initial segment, on which 
this common point was established is represented on page one of the hand- 
out. The shortest transcript was that of c, which had one indistinguishable
utterance of the first emitter before the common beginning phrase and none
at the end of the common transcription. Next was d with two distinguished
utterances by the first emitter, before the common starting point, and 4
lines (2 emitters) beyond the end point. B began at the common point, but
continued for 9 lines (6 emitters) past the common end. A had one
distinguishable utterance before the first, and 11 lines (6 emitters) after
the end point of the common transcription.
The corresponding transcription segments were compared for gross
characteristics which are often used to describe manuscripts: number of
lines, sentences, words, emitter types and word totals. Page and line
are obviously inadequate descriptive categories for manuscripts unless
they are all made on the same size paper with the same size type. The
typed a, b, c and d transcripts were, but some of the other transcripts in this
study differed in line spacing, size of paper, and size of graphic symbols. 
(The handwritten copies had either bigger or smaller symbols than those 
which were typed.) 
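
A tally of this kind can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the procedure actually used: each transcript is assumed to be already reduced to (emitter, utterance) pairs, one per line; sentences are counted by their terminals; and "words" is read here as distinct word forms, which the text does not confirm. The function name and field names are hypothetical.

```python
import re

# Sentence terminals: three dots count as one terminal, not three periods.
TERMINAL_RE = re.compile(r"\.\.\.|[.?!]")
WORD_RE = re.compile(r"[A-Za-z']+")


def gross_characteristics(lines):
    """Tally descriptive counts for one transcript.

    `lines` is a list of (emitter, utterance) pairs, one pair per line.
    """
    emitters = set()
    sentences = 0
    words = []
    for emitter, utterance in lines:
        emitters.add(emitter)
        sentences += len(TERMINAL_RE.findall(utterance))
        words.extend(w.lower() for w in WORD_RE.findall(utterance))
    return {
        "lines": len(lines),
        "sentences": sentences,
        "word_types": len(set(words)),    # distinct word forms (assumed reading)
        "emitter_types": len(emitters),
        "word_total": len(words),
    }
```

Running the same function over each transcriber's version of a segment gives directly comparable rows, independent of paper size or type size.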

Number of sentences for the manuscripts was similar, but on close 
examination the content of the terminals was found to contrast sharply.
In the tally for potential sentences, 91 potentials, or 4 more than the
greatest number of sentences indicated by any one transcriber, were
postulated, but this number might vary for each analyst contrasting the scripts.

The fifth original transcript was made by secretary b, an arbitrary 
choice, for both Classes. All transcripts but II c and d were then edited 
by a team of individuals listed on page 8 of the handout.

The first four unedited or raw transcripts for Class I were 
extensively contrasted with each other for differences in major block 
categories of Emitter, Annotation, Punctuation and Utterance. These 
were subdivided further according to a hierarchical code devised to prepare
the data for ultimate input into a computer, where the long lists of
idiosyncratic and other deviations might be tallied with greater ultimate
ease, or at least accuracy.

From the raw transcripts for Class I a handwritten transcription
was calculated based upon majority agreement of processors and contextual
fit. This calculation was done by a naive editor, that is to say a
non-linguist, relatively unsophisticated female white sophomore. It was edited
against the tape by the same person, then re-edited by myself and a
sophisticated, acute, male white sophomore. The second transcription
by b (P) was edited four times against the tape. An
exhaustive chart was drawn to align the manuscripts, and make some hand
tabulations, by a linguistically naive female white senior, with apparently
good judgment.
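
The majority-agreement step can be illustrated with a short Python sketch. It is a simplified assumption about the procedure, not a record of it: the versions are taken to be already aligned slot by slot, as on the chart, and only the vote is automated. Contextual fit remains a matter for a human judge, so slots without a strict majority are merely flagged. The function names are hypothetical.

```python
from collections import Counter


def majority_reading(slot):
    """Return the reading most transcribers agree on, or None when no
    reading is supported by more than half of the versions."""
    reading, votes = Counter(slot).most_common(1)[0]
    return reading if votes > len(slot) / 2 else None


def consensus(aligned_slots):
    """Build a consensus transcript from aligned versions.

    `aligned_slots` holds one tuple per position, with one reading per
    transcriber. Positions lacking a strict majority are left as None and
    their indexes returned, to be settled against the tape by a judge.
    """
    result, unresolved = [], []
    for index, slot in enumerate(aligned_slots):
        reading = majority_reading(slot)
        if reading is None:
            unresolved.append(index)
        result.append(reading)
    return result, unresolved
```

With four versions a two-against-two split is left unresolved, which is the case in which the tape and a second judge are most needed.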

For some of the contrasts, more than one chart was drawn and 
compared. G, n, and n all worked on contrasting some of the versions.